SECTION I
INTRODUCTION


This report describes a study whose objective is the development of an
automatic speaker verification system that can be used over a degraded
communication channel such as a telephone line. A typical telephone line is
known to introduce band-limiting, phase distortion, and noise onto the signal
being transmitted. The problem, then, is to adapt the method of speaker
verification so as to minimize the effects of these distortions on the system.

The methods used in this study to compensate for the channel conditions of a
simulated telephone line resulted in a 99% acceptance rate of true speakers
and a 99% rejection rate of impostors for a limited test data set.

Texas Instruments has conducted research on automatic speaker verification
for a number of years and has succeeded in moving the technology out of
the laboratory and into a working environment. For example, an automatic
entry control system has been in operation in the Corporate Information Center
at Texas Instruments for the past two years, and a similar system is being
evaluated as part of a base installation and security system (BISS). These
systems are in a relatively noise-free environment and utilize good
communication lines and dynamic microphones. However, there are various
military and commercial applications, such as credit card validation, which
require speaker verification from a remote location. The obvious solution to
this problem is to use the readily available telephone line for communication.
The purpose of the current study, then, was to determine the degradation in
performance resulting from using the telephone line and to develop methods of
improving performance to the point where it was comparable with that of the
nondegraded channel.

For basically the same reasons that the telephone line is attractive
as a communication channel, the ready availability and economy of the
telephone headset make it attractive as a microphone. These advantages must be
weighed against the additional distortion obtained from the carbon
microphone, whose characteristics can change according to the physical abuse
involved. It is not unrealistic to consider replacing the carbon microphone
with a better quality microphone.

The problems of the telephone headset were not addressed to any extent
in this study except to perform a very limited test over several telephones
with one speaker. The methods of compensation for the degraded channel used
in this study were also restricted to techniques which would work with the
current speaker verification system.


Section II of this report describes the current speaker verification
system and its time registration and verification procedures. Section III
describes the results of a telephone survey conducted by the Bell Telephone
Laboratories, which characterized the telephone lines with respect to
attenuation distortion, delay distortion, and the signal-to-noise ratio.
Section IV describes a preliminary evaluation of the current speaker
verification system on a simulated telephone channel. Section V describes the
methods used for compensation in the speaker verification system when used
over a degraded channel. This section also discusses several other methods
that could be used, along with their advantages and disadvantages. Section VI
describes an experiment conducted to determine the degradation of performance
due to the communication channel and to determine the performance of the
system when compensation techniques are used. A simulated telephone line,
obtained by using the RADC-DICEF facility, is also characterized. Section VII
gives the results of this experiment. Section VIII summarizes the study and
offers recommendations for further study.

SECTION II
SPEECH PROCESSING


The speech processing strategy at Texas Instruments has evolved
considerably from the first methods used, although all versions are based on
the relative spectrum of the speech as a function of time. This section
describes the processing strategy used during this study. A comparison of the
processing used for the BISS speaker verification system with that given here
for the remote terminal study is presented in the appendix.

A.  Preprocessing 

1. Filter Bank Definition

The  spectrum  is  obtained  by  processing  the  speech  signal  through 
an  analog  filter  bank  preceded  by  a high  frequency  preemphasis  network.  The 
filter  bank  consists  of  16  bandpass  filters,  each  followed  by  a fullwave 
rectifier  and  a four-pole  lowpass  Bessel  filter  with  a 3 dB  cutoff  at  30  Hz. 
Each  of  the  16  filters  is  sampled  100  times  per  second.  The  analog  filter 
characteristics  are  given  in  Table  1. 


For processing, the top three filters (14, 15, and 16) are averaged and
filter 14 is replaced by this average. Filters 15 and 16 are set to zero. The
resulting 14 filter outputs at each time sample are represented by the
spectrum amplitude vector:

\[
A_j =
\begin{bmatrix} a_{1,j} \\ a_{2,j} \\ \vdots \\ a_{14,j} \end{bmatrix}
=
\begin{bmatrix} a_1(t_j) \\ a_2(t_j) \\ \vdots \\ a_{14}(t_j) \end{bmatrix}
\]
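
The folding of the 16 filter outputs into a 14-element vector can be illustrated with a minimal Python sketch. The function name `amplitude_vector` and the list-based input are assumptions made for illustration only; the report applies this step to the sampled outputs of the analog filter bank.

```python
def amplitude_vector(filters16):
    """Fold 16 filter-bank samples (one time step) into a 14-element vector.

    Filters 14, 15, and 16 are averaged and the average replaces filter 14;
    filters 15 and 16 are then dropped (the report sets them to zero).
    """
    if len(filters16) != 16:
        raise ValueError("expected 16 filter outputs")
    top_average = sum(filters16[13:16]) / 3.0
    return list(filters16[:13]) + [top_average]
```

For example, an input of the values 1 through 16 keeps the first 13 values unchanged and yields 15.0 as the fourteenth element.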

TABLE 1. CHARACTERISTICS OF 16-CHANNEL FILTER BANK

Filter    Center Frequency    Bandwidth
          (Hz)                (Hz, at -6 dB)

 1          350                 300
 2          450                 300
 3          555                 310
 4          670                 340
 5          790                 380
 6          940                 400
 7         1120                 400
 8         1320                 400
 9         1550                 400
10         1810                 400
11         2100                 400
12         2420                 400
13         2800                 400
14         3200                 400
15         3800                 800
16         5000                1600

2. Regression

It has been found that by eliminating the gross aspects of the
spectrum, such as the slope and curvature, more clearly defined formant
frequencies are obtained.¹ Therefore, the spectrum amplitude vector is
regressed by the first three elements of an orthonormal basis set:

\[
(A_j)_R = A_j - \sum_{k} c_{jk} F_k
\]

where the sum runs over the first three basis vectors F_k, with elements
f_{ik}, i = 1, 2, ..., 14, and

\[
c_{jk} = \frac{1}{14} \sum_{m=1}^{14} a_{mj} f_{mk}
\]

[The explicit definitions of the basis elements, including a cosine
expression for f_{i2}, are not legible in the source.]
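
The regression step can be sketched as the removal of the projection of the amplitude vector onto three orthonormal basis vectors. The sketch below is only an illustration under stated assumptions: the report's own basis functions are not recoverable here, so a polynomial basis (constant, slope, curvature) orthonormalized with a QR decomposition is swapped in, and the basis is scaled to unit norm rather than using the report's 1/14 factor. The names `polynomial_basis` and `regress` are hypothetical.

```python
import numpy as np

def polynomial_basis(n_channels=14, n_terms=3):
    """Stand-in orthonormal basis: constant, slope, and curvature terms.

    This is NOT the report's basis; it is an assumed substitute.
    """
    x = np.linspace(-1.0, 1.0, n_channels)
    columns = np.vander(x, n_terms, increasing=True)  # 1, x, x^2
    q, _ = np.linalg.qr(columns)                      # orthonormal columns
    return q

def regress(amplitudes, basis):
    """Return the regressed vector and the regression coefficients."""
    amplitudes = np.asarray(amplitudes, dtype=float)
    coeffs = basis.T @ amplitudes          # projection onto each basis vector
    return amplitudes - basis @ coeffs, coeffs
```

With this substitute basis, a spectrum that is a pure straight line across the 14 channels regresses to zero, which is the intended effect of removing slope and curvature.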

3. Normalization

The regressed amplitude vector is then normalized by a modified
postregression standard deviation, σ_j, for time t_j, where σ_post,j is
formed from the products of the regressed amplitudes a_mj over the 14
channels. [The defining expressions for σ_j and σ_post,j are not legible in
the source.] If σ_post,j/σ_pre,j < R_min, then σ_post,j/σ_pre,j = R_min. The
resulting normalized amplitude vector is then:

\[
(A_j)_N = \frac{1}{\sigma_j} (A_j)_R
\]

The σ_post at time t_j is used as the energy measure. The energy (σ_post,j)
and the coefficients c_j1 and c_j2 are also normalized by σ_j.

4. Quantization

The regressed and normalized amplitude vector is then quantized to
one of eight levels according to a set of quantization thresholds {θ_iq}:

\[
(a_{ij})_Q = q \quad \text{if} \quad
\theta_{i,q-1} < (a_{ij})_N \le \theta_{i,q}, \qquad q = 1, \ldots, 8
\]

where θ_i,q < θ_i,q+1, θ_i0 = -∞, and θ_i8 = +∞. The {θ_iq} were chosen so
that the probability that (a_ij)_Q = q is 1/8 for all q. The regression
coefficients are quantized in the same manner as the amplitude vector. A
linear quantization is used for the energy.
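
Choosing thresholds so that each of the eight levels is equally likely amounts to placing them at the 1/8, 2/8, ..., 7/8 quantiles of the training values. The following is a minimal sketch under that reading; the function names, the use of NumPy, and the level numbering 1 through 8 are assumptions for illustration and are not taken from the report.

```python
import numpy as np

def equal_probability_thresholds(training_values, n_levels=8):
    """Interior thresholds at the k/n_levels quantiles of the training data."""
    fractions = np.arange(1, n_levels) / n_levels
    return np.quantile(np.asarray(training_values, dtype=float), fractions)

def quantize(values, thresholds):
    """Map each value to a level 1..n_levels.

    A value equal to a threshold falls in the lower level, i.e.
    level q covers threshold[q-2] < value <= threshold[q-1].
    """
    return np.searchsorted(thresholds, values, side="left") + 1
```

Applied to the same data the thresholds were derived from, each level receives roughly one eighth of the samples.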


B.  Processing 

A detailed description of the processing is given in the Speaker
Verification III report,* pages 13 and 14. Briefly, the processing strategy
is to first accurately time register critical points of the spectrum, or
points of greatest spectral change. This is accomplished by defining scanning
patterns from enrollment data. These scanning patterns are scanned across the
input data to find an optimum sequence of registration points. This can be
considered a zeroth-order time warping to time align various points of the
input and reference data. Recognition patterns are then formatted between two
or more time registration points which are then used for verification. This is
equivalent to a first-order time warping in that it helps account for changes
in the length of words. A description of the scanning pattern definition,
optimal sequence strategy, recognition pattern definition, and the
verification strategy will now be presented.


1. Scanning Pattern Definition

The scanning patterns are formed from the spectral data and are
used to time align the input data with the reference data. The scanning
pattern formed at time t_j consists of (a) the average of spectral data at
times t_{j-1} and t_{j-2}, (b) the average of spectral data at times t_{j+1}
and t_{j+2}, and (c) the difference (b) - (a). Figure 1 illustrates the
formation of a scanning pattern from the spectral data.
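
The three parts of a scanning pattern can be sketched as follows. This is a minimal illustration, assuming the spectral data are held in a NumPy array with one row per time sample; the function name `scanning_pattern` and the choice to concatenate the three parts into a single vector are assumptions, not details confirmed by the report.

```python
import numpy as np

def scanning_pattern(spectra, j):
    """Form the scanning pattern centered at time index j.

    spectra: array of shape (n_times, n_channels).
    Returns the concatenation of (a) the mean of frames j-1 and j-2,
    (b) the mean of frames j+1 and j+2, and (c) the difference (b) - (a).
    """
    if j < 2 or j + 2 >= len(spectra):
        raise IndexError("need two frames on each side of j")
    before = (spectra[j - 1] + spectra[j - 2]) / 2.0
    after = (spectra[j + 1] + spectra[j + 2]) / 2.0
    return np.concatenate([before, after, after - before])
```

Note that the frame at t_j itself does not enter the pattern; only the two frames on either side are used.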


2. Scanning Error Computation

The  scanning  patterns  are  used  to  determine  the  location  of  the  time 
registration  points  or  reference  points  in  the  input  speech  on  a phrase  basis. 
At  each  time  sample  a scanning  pattern  is  formatted  from  the  input  speech  and 
compared  with  each  of  the  reference  scanning  patterns  for  the  words  in  the 
phrase.  This  comparison  is  done  using  the  squared  error  between  the  k'th 
reference  scanning  pattern  and  the  scanning  pattern  defined  at  the  j'th  time 
sample  of  the  input  data: 


3.  Valley  Finding 

Using the scanning errors as a function of time, an error function
is generated for each of the reference scanning patterns. Each function
is monitored for dips of sufficient magnitude to be considered as potential
locations of the corresponding reference points in the input data. These dips
are called valley points when the ratio of the local maximum (peak) to the
dip is greater than or equal to a specified peak-to-valley (PV) ratio, which
in this case is 1.3, and the magnitude of the valley point is less than or
equal to 200. A peak of (1/PV-ratio) * (valley point error) is required
before another valley point can be found.

Figure 1.* Example Scanning Pattern Formation
(Center point of scanning pattern at time = t_j)
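
A valley search of this kind can be sketched in a few lines. The sketch below applies the two stated criteria (peak-to-valley ratio of at least 1.3 and valley error of at most 200) to a list of scanning errors. The re-arming rule is simplified: as an assumption, the running peak is simply reset at each accepted valley, which does not reproduce the report's exact peak requirement. The function name `find_valleys` is hypothetical.

```python
def find_valleys(errors, pv_ratio=1.3, max_valley=200.0):
    """Return the indices of valley points in a scanning-error sequence.

    A local minimum is accepted when the largest error seen since the
    previous valley is at least pv_ratio times the minimum, and the
    minimum itself does not exceed max_valley.
    """
    valleys = []
    if not errors:
        return valleys
    peak = errors[0]
    for i in range(1, len(errors) - 1):
        peak = max(peak, errors[i - 1])
        is_dip = errors[i] < errors[i - 1] and errors[i] <= errors[i + 1]
        if is_dip and errors[i] <= max_valley and peak >= pv_ratio * errors[i]:
            valleys.append(i)
            peak = errors[i]  # assumed reset: a fresh peak must build up
    return valleys
```

Shallow dips (peak less than 1.3 times the dip) and dips whose error stays above 200 are ignored by this sketch.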


4.  Reference  Point  Sequencing 

The next step is to pick a valley point from each reference point
in the phrase to fit together a sequence of eight reference points for the
phrase. A sequence is determined for all combinations of valley points which
satisfy the phrase-specific minimum and maximum time distance restrictions
between each pair of reference points. An error is determined for each
reference point pair that is based on the scanning errors, e_rp, of the
points and the expected time distance between the two points. The point
pair error for reference points k and k + 1 is given by:

\[
e_{pp_k} = \left( e_{rp_k} + e_{\min} \right)
           \left( e_{rp_{k+1}} + e_{\min} \right)
           \left( 1 + \beta \,
           \frac{\left| \Delta t_k - \overline{\Delta t}_k \right|}
                {\overline{\Delta t}_k} \right)
\]

where e_min = 40, \Delta t_k is the distance between the points,
\overline{\Delta t}_k is the expected distance between the points, and
β (= 1) is a penalty assigned for deviations from the expected distance.
These point pair errors are summed for all reference point pairs to obtain a
sequence error for the phrase. This sequence error is limited by a maximum
threshold of 100. The optimal sequence of eight reference points for the
phrase is the combination of valley points with the minimum sequence error.
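
The search for the optimal sequence can be sketched as an exhaustive enumeration over valley-point combinations. This is a minimal illustration under stated assumptions: the exact point pair error expression is not fully legible, so `pair_error` below is only one possible reading (using e_min = 40 and β = 1 from the text) and should be replaced by the exact expression where it is known; the maximum sequence-error threshold is not modeled. All function and parameter names are hypothetical.

```python
from itertools import product

def pair_error(e_k, e_k1, dt, dt_expected, e_min=40.0, beta=1.0):
    """Assumed reading of the point pair error; not confirmed by the source."""
    deviation = abs(dt - dt_expected) / dt_expected
    return (e_k + e_min) * (e_k1 + e_min) * (1.0 + beta * deviation)

def best_sequence(valleys, limits, expected):
    """Pick one valley per reference point with minimum summed pair error.

    valleys[k]  : list of (time, scanning_error) candidates for point k
    limits[k]   : (min_dt, max_dt) allowed between points k and k+1
    expected[k] : expected time distance between points k and k+1
    Returns (sequence_error, chosen_valleys), or None if nothing fits.
    """
    best = None
    for combo in product(*valleys):
        total = 0.0
        feasible = True
        for k in range(len(combo) - 1):
            (t0, e0), (t1, e1) = combo[k], combo[k + 1]
            dt = t1 - t0
            low, high = limits[k]
            if not low <= dt <= high:
                feasible = False
                break
            total += pair_error(e0, e1, dt, expected[k])
        if feasible and (best is None or total < best[0]):
            best = (total, combo)
    return best
```

Exhaustive enumeration is practical here because each reference point typically has only a few valley candidates and the time distance restrictions discard most combinations early.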


5. Verification Error

To  obtain  the  verification  error  for  a phrase,  a recognition 
pattern  is  formatted  between  the  time  registration  points  of  each  word  in 
the  phrase.  A sample  recognition  pattern  is  illustrated  in  Figure  2,  where 
it  is  shown  that  the  pattern  is  interpolated  between  the  two  reference  points, 
which  is  in  contrast  to  the  BISS  verification  where  the  scanning  error  is  used 
for  verification. 


The  resulting  verification  patterns  are  compared  with  reference 
patterns,  and  a squared-error  sum  is  obtained  between  the  two  patterns. 
This  is  called  the  spectral  error  for  the  word;  when  the  four  word  errors 
are  added,  it  becomes  the  spectral  error  for  the  phrase. 

Figure 2. Example Recognition Pattern Formation

SECTION III

TELEPHONE CHANNEL CHARACTERISTICS


Bell Telephone Laboratories has conducted several system-wide
transmission surveys since 1959, the latest being a 1969-1970 survey conducted
by Duffy and Thatcher.² The survey separated the results into three mileage
categories: short (0 - 180 miles), medium (180 - 725 miles), and long
(725 - 2900 miles). Of the transmission characteristics discussed in the
survey, noise, loss, attenuation distortion, and envelope delay distortion
are the most important to speaker verification. The noise, attenuation
distortion, and envelope delay distortion will be discussed in more detail
in the next three sections. The loss can be compensated by using an amplifier
and should cause no problem to speaker verification processing provided the
response of the amplifier is flat. In a series of trials in the laboratory
it was found that the loss was greater over the lines that went through a
"dial 9" or outside line as opposed to an inside line or Centrex. The level
of the signal can be compensated by an automatic gain control.


A.  Noise 

The  noise  study  was  broken  into  circuit  noise  and  impulse  noise.  The 
impulse  noise  results  are  given  in  Table  2 and  represent  the  number  of  noise 
impulses  exceeding  four  different  voltage  levels  received  when  a 2750  Hz 
signal  at  -12  dBm  is  transmitted. 


The circuit noise results are given graphically in Figure 3 as the ratio
of received signal power to C-notched noise. Note that the average
signal-to-noise ratio for all mileage categories is 41 dB. Also, Figure 3
shows that less than 1% of all the lines have a signal-to-noise ratio less
than 20 dB.

Connection Length
(Airline Miles)

All           40.6 ± 3.0     11.8
0-180         42.1 ± 3.8     13.0
180-725       36.5 ± 1.3      5.3
725-2900      35.4 ± 0.9      3.8

B. Attenuation Distortion

Attenuation distortion is a measure of the variation in loss caused by
a change in frequency of the transmitted signal. The results of the
attenuation distortion study are shown in Table 3, where the mean and standard
deviation are given as a function of the transmitting frequency. The locus
of the mean is plotted in Figure 4. The average attenuation distortion is
less than 3 dB between approximately 500 Hz and 2700 Hz. Beyond this range,
the attenuation distortion increases rapidly.

C.  Envelope  Delay  Distortion 

Envelope delay is the derivative of the phase characteristic with
respect to frequency, and the distortion is the envelope delay minus a
constant delay term for a particular frequency, in this case the delay at
1700 Hz. The mean and standard deviation of the envelope delay distortion
are given in Table 4 for various frequencies. The locus of the means of the
envelope delay distortion is plotted in Figure 5. Again it can be seen that
the distortion increases rapidly at the lower and higher frequencies. Since
the spectrum is averaged over a 10 ms time period, the phase distortion is
not thought to be especially damaging to the verification procedures.
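
The definition of envelope delay distortion can be made concrete with a short numerical sketch. It assumes a phase response sampled at known frequencies and uses the common sign convention in which delay is the negative derivative of phase with respect to angular frequency; that convention, the function name, and the use of NumPy are assumptions for illustration and are not taken from the survey.

```python
import numpy as np

def envelope_delay_distortion(freq_hz, phase_rad, ref_hz=1700.0):
    """Envelope delay relative to the delay at the reference frequency.

    freq_hz   : increasing measurement frequencies in Hz
    phase_rad : channel phase at those frequencies, in radians
    Returns the delay distortion in seconds at each frequency.
    """
    freq_hz = np.asarray(freq_hz, dtype=float)
    omega = 2.0 * np.pi * freq_hz
    # Assumed convention: delay = -d(phase)/d(omega)
    delay = -np.gradient(np.asarray(phase_rad, dtype=float), omega)
    reference_delay = np.interp(ref_hz, freq_hz, delay)
    return delay - reference_delay
```

A channel with perfectly linear phase has the same delay at every frequency, so its distortion is zero everywhere; only curvature in the phase response produces nonzero values.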

SECTION IV

PRELIMINARY EVALUATION


As the basis for evaluation of the effects of a telephone channel on
the current speaker verification technology, a very limited experiment was
conducted. The test data was run through the RADC-DICEF facility to
introduce noise and channel coloration typical of a telephone channel, and the
performance was evaluated.

A.  Data  Set 

A limited subset of the BISS Mitre analog data base and the BISS Phase I
data set³ collected at Texas Instruments was used.

Mitre Data Set

(1) No. 432 (male) - enrollment; 2 post enrollment sessions; 1 execution
session

(2)  No.  5088  (male)  - impostor  trial 

(3)  No.  3387  (male)  - impostor  trial 

(4)  No.  1368  (male)  - impostor  trial 

(5)  No.  9062  (male)  - impostor  trial 

(6)  No.  3263  (male)  - impostor  trial 

Phase  I Data  Set 

(7)  Dan  Daniel  (male)  - enrollment;  4 post  enrollment  sessions;  4 
execution  sessions. 

B.  Simulated  Telephone  Lines 

The above analog data set was processed through the RADC-DICEF facility
to introduce noise, a flat line, and a 4A line coloration. The flat line
had a virtually flat response. The amplitude distortion of the 4A line, or
simulated telephone line, is shown in Figure 6. The envelope delay distortion
is shown in Figure 7. As can be seen by comparing these figures with
Figures 4 and 5, they simulate the average telephone line reasonably well.

The channel conditions considered for the experiment are:

(1)  Original tape
(2)  4 kHz lowpass filter
(3)  Flat line; no noise added
(4)  Flat line; 10 dB S/N
(5)  Flat line; 20 dB S/N
(6)  Flat line; 30 dB S/N
(7)  4A line; no noise added
(8)  4A line; 10 dB S/N
(9)  4A line; 20 dB S/N
(10) 4A line; 30 dB S/N

C.  Experiment 


To evaluate the effect of the degraded channel conditions on the speaker
verification algorithm, various channel conditions were used for enrollment
and then execution. For Type I errors (rejection of the true speaker), speaker
No. 432 and Dan Daniel were used in the following combinations:

Type I

Enrolled with          Execution with
Channel Condition      Channel Condition

 1                     1 - 10
 2                     2
 3                     1 - 10
 4                     4
 5                     5
 6                     6
 7                     1 - 10
 8                     8
 9                     1 - 10
10                     10

For the Type II errors (acceptance of impostors), speakers 2 through 6
of the Mitre data set were used against the true speaker (No. 432). The
channel conditions used were:

Type II

(No. 432) Enrolled with    Impostor Trials of 2 - 6
Channel Condition          with Channel Conditions

1                          1, 3, 7, 9
3                          1, 3, 7, 9
7                          1, 3, 7, 9
9                          1, 3, 7, 9

D. Results

The results of the experiment are shown in Table 5. A spectral error
threshold of 200 was chosen and Type I and Type II errors calculated. A
Type I error occurs when the spectral error for the true speaker is greater
than the threshold, and a Type II error occurs when the impostor spectral
error is less than or equal to the threshold. Because of the limited nature
of this experiment, the results have very little meaning except in a
qualitative sense. For example, Table 5 shows that great difficulty was
encountered in registering reference points for the true speakers when the
execution was on a different channel condition than the enrollment. In the
next section compensation methods will be discussed with which these phrases
can be registered.
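
The error definitions above reduce to two threshold comparisons, sketched below. The function name `count_errors` and the example numbers in the usage note are illustrative assumptions; they are not data from Table 5, and phrases for which no sequence was registered are not modeled.

```python
def count_errors(true_speaker_errors, impostor_errors, threshold=200.0):
    """Count Type I and Type II errors for a spectral error threshold.

    Type I : true-speaker spectral error greater than the threshold.
    Type II: impostor spectral error less than or equal to the threshold.
    """
    type_1 = sum(1 for e in true_speaker_errors if e > threshold)
    type_2 = sum(1 for e in impostor_errors if e <= threshold)
    return type_1, type_2
```

For made-up errors of 150, 200, and 201 for the true speaker and 199, 200, and 350 for impostors, the function reports one Type I error and two Type II errors, since an error exactly equal to the threshold counts as an acceptance.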

TABLE 5. SPEAKER PERFORMANCE IN PRELIMINARY EXPERIMENT

Number of phrases with spectral error greater than 200
Number of phrases with spectral error less than or equal to 200
No sequences were registered


SECTION  V 

CHANNEL  COMPENSATION  TECHNIQUES 


A variety  of  philosophies  can  be  employed  to  compensate  for  the  channel 
coloration  and  noise  introduced  by  a telephone  channel.  These  would  include: 

(1)  Replacing  the  channel  with  a twisted  pair  so  as  to  eliminate  the 
degrading  effects  of  the  channel.  Obviously,  this  would  not  be  practical  in 
all  cases,  such  as  between  distant  locations. 

(2) Eliminating the channel by sending a digital signal over the
telephone line instead of the analog voice data. This would involve placing a
filter bank and preprocessor at each sending location.

(3) Calibrating the channel so as to weight various parts of the
spectrum according to the amplitude distortion of the telephone line being
used. This would involve some sort of signal generator at each sending
location and an additional piece of equipment at the receiving location. An
ultimate loss of spectral information would result due to the severe amplitude
distortion outside of the approximate spectral range of 500 to 2700 Hz.
Another way to calibrate the channel would be to use a channel equalizer to
flatten the spectrum. This method usually results in a considerable amount of
noise being added to the channel outside the frequency range of 500 to
2700 Hz.

(4)  Normalizing  the  data  and  expanding  the  speech  measures.  This  might 
involve  normalizing  the  data  with  the  noise  energy  during  quiescent  periods 
and  also  using  a band-limited  spectrum  corresponding  to  the  narrow  band  of 
the  telephone  line  for  speaker  processing.  This  would  involve  a change  in 
the  speech  processing  software  at  the  receiving  location.  Expansion  of  the 
speech  measures  might  include  a channel  resistant  speech  measure  such  as 
pitch  period.  This  would  involve  a piece  of  hardware  and  some  additional 
software  at  the  receiving  location. 

Most of the procedures mentioned would involve an additional piece of
equipment at the sending location. This may be good or bad, depending on
the particular application of the remote terminal speaker verification
system and the ultimate cost of the piece of equipment. Some combination of
the above procedures may also be appropriate in some applications.

The method of compensation chosen in the present study was the last
approach: normalizing the data, using a band-limited spectrum for time
registration, and adding the pitch period for additional discrimination
in the verification process. These measures allow the compensation to be
accomplished at the receiving location and are elaborated in the next
few sections.

A.  Noise 

According to the Bell Telephone Laboratories survey of telephone lines,
two major types of noise can be expected: circuit noise and impulse noise.

It is not felt that the impulse noise problem is particularly severe for
speaker verification, for two reasons: (1) The spectrum is obtained from a
filter bank sampled every 10 ms, each filter being the average over a 10 ms
interval of the time series. One or two impulses would have little effect
on the filter bank output. (2) A sequential decision strategy can be
employed for verification so that if the noise affects the verification on
one phrase, an additional phrase can be used.

The  circuit  noise  is  more  bothersome  and  several  techniques  can  be  used 
to  overcome  its  effects.  First,  through  regression  and  normalization  by  the 
standard  deviation,  the  constant  noise  can  be  eliminated  and  the  effects  of 
white  noise  can  be  reduced  as  shown  in  the  speaker  verification  II  report.^ 
The  results  of  the  preliminary  evaluation  discussed  in  Section  IV  show  that 
the  spectral  errors  for  both  the  true  speaker  and  impostors  increase  with 
increasing  noise.  Therefore,  some  method  is  needed  to  normalize  the  spectral 
error  for  the  noise  level.  One  method  would  be  to  measure  the  energy  in  the 
input  signal  during  times  of  silence  by  the  speaker  and  use  this  in  a 
normalization  scheme. 

Another method of minimizing the effects of noise is to measure the
energy in the input signal during periods of silence and then filter it out
of the signal with a Wiener filter. Unfortunately, the lab speech system
does not allow processing the signal between the time it is sampled and
the time it is processed through the analog filter bank. Therefore, this
method was not tried.

B.  Time  Registration 

As can be seen from the preliminary evaluation, time registration of the
true speakers was a significant problem, particularly on the cross-channel
conditions. Since the flat lines with noise added seemed to give little
problem, it became apparent that the amplitude distortion was preventing
time registration. Therefore, a band-limited spectrum was used. From
Figure 6 it can be seen that the amplitude of the 4A line is fairly flat
between 500 and 2750 Hz. Table 1 shows the frequency characteristics of the
16 analog filters. As mentioned earlier, the top three channels are averaged
and used for filter 14. Therefore, the top four filters and the bottom two
filters were not used, leaving 10 filters covering the frequency range from
400 to 2620 Hz. New quantization limits were obtained, and the spectrum was
renormalized. The preliminary evaluation test was rerun for the difficult
time registration channel conditions. All but one of the phrases that were
not previously registered were registered using the 10-channel spectrum.
The results are shown in Table 6, where the number of phrases not registering
a sequence for the true speakers is shown using 14 filters and 10 filters.

The frequency range of 400 to 2620 Hz seems also to be appropriate for
actual telephone lines. This was borne out in a very small survey using
Dan Daniel (a participant in the original BISS Phase I experiment) over
several telephones in the laboratory where inside and outside lines were
used. All of his phrases over these telephone lines were time-registered
using the 10-channel spectrum. This experiment will be discussed in greater
detail in Section VIII.B. That this method of compensating for the amplitude
distortion should also work rather well on a typical telephone line can be
seen by looking at Table 3. In the range of frequencies from 400 to 2620 Hz
the mean distortion is less than 2.6 dB with a standard deviation of 1.6 dB.
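
The 400 to 2620 Hz range follows directly from the Table 1 filter characteristics if the band edges are taken at the center frequency plus or minus half the -6 dB bandwidth, which is an assumption made here for illustration. The sketch below checks that arithmetic for filters 3 through 12 and shows the corresponding channel selection; the function names are hypothetical.

```python
# Table 1 values: center frequency and -6 dB bandwidth of the 16 filters (Hz).
CENTER_HZ = [350, 450, 555, 670, 790, 940, 1120, 1320,
             1550, 1810, 2100, 2420, 2800, 3200, 3800, 5000]
BANDWIDTH_HZ = [300, 300, 310, 340, 380, 400, 400, 400,
                400, 400, 400, 400, 400, 400, 800, 1600]

def band_edges(first_filter, last_filter):
    """Lower and upper edge (Hz) spanned by filters first..last (1-based)."""
    lower = CENTER_HZ[first_filter - 1] - BANDWIDTH_HZ[first_filter - 1] / 2.0
    upper = CENTER_HZ[last_filter - 1] + BANDWIDTH_HZ[last_filter - 1] / 2.0
    return lower, upper

def band_limit(vector14):
    """Keep filters 3 through 12 of a 14-element amplitude vector."""
    if len(vector14) != 14:
        raise ValueError("expected a 14-element amplitude vector")
    return list(vector14[2:12])
```

Calling `band_edges(3, 12)` gives 400.0 and 2620.0 Hz, which matches the range quoted in the text for the 10-channel spectrum.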

TABLE 6. MISREGISTERED PHRASES FOR TRUE SPEAKERS


Dan  Daniel  has  31  total  phrases 
No.  432  has  9 total  phrases 


C.  Verification 

The 10-channel spectrum is used in forming the verification patterns in
addition to being used for time registration. Thus, a net loss of information
for verification is suffered when comparing with the 14 channels of
information available on a flat line. To bring the performance of the speaker
verification system over a telephone line up to that obtained on a flat line,
an additional speech measure is needed for the verification. Several
authors⁴,⁵ have shown that pitch is relatively insensitive to the bandpass
characteristics of a telephone channel, although it is more susceptible to
mimics than other measures, such as the spectrum. Therefore, a combination
of spectral error and pitch period error is used for the verification. The
pitch extraction method is described in Section V.D.


D. Pitch Extraction


The pitch is extracted only in certain segments of a phrase, i.e.,
between the marked vowels of each word in the phrase. The following steps
are used in extracting an optimal pitch track:

(1) Obtain pitch period estimates at each time step, t_i. Pitch
period estimates are made every 10.05 ms (the sampling interval of the filter
bank) using the Cepstrum. A 201-point (30.15 ms) data block consisting of
the preceding, current, and next time frames is used. The points of this
data block are obtained from an A/D filter with a 6667 samples/second sampling
rate. The data block is low-pass filtered and passed through a Hamming
window before a 256-point Cepstrum is computed.

(2) Compute candidate pitch period estimates at each time step, t_i.
A peak-finding algorithm is used to find all of the peaks of sufficient
magnitude of the Cepstrum between 52 and 202 Hz (4.95 to 19.2 ms). In
particular, the maximum peak plus all others greater than 0.25 x (maximum
peak) are considered. These peaks are the candidate pitch period estimates
at time t_i, along with an unvoiced estimate.

(3) Calculate transition penalty for pitch period from time step t_{i-1}
to t_i. Each of the pitch-period estimates at time t_i is assessed a
transition penalty for the transition from each of the saved pitch-period
estimates at time step t_{i-1}. This transition penalty is based on the
smoothness of the pitch track and the magnitude of the Cepstral peaks. The
penalty assessed to each pitch-period estimate from time step t_{i-1} to t_i
is given by:

      Transition
t_{i-1}       t_i          Penalty

Voiced        Voiced       [1 + β(Δτ/τ)] / p
NA            Unvoiced     k_1
Unvoiced      Voiced       [1 + βτ_1] / p + k_2

where β   = 200
      k_1 = 0.08
      k_2 = 0.19
      τ_1 = 0.0
      τ   = pitch-period estimate
      p   = Cepstral peak value
      Δτ  = τ_i - τ_{i-1}

(4)  Calculate cumulative pitch-period penalty up to time step t_i. As
the transition penalty is calculated for each pitch period estimate at time
t_i with all of the saved pitch-period estimates at time t_{i-1}, it is added
to the cumulative error associated with the pitch-period estimates at time
t_{i-1}. The eight current pitch-period estimates with the lowest cumulative
penalty are saved at this time step. These new cumulative penalties are
saved with the pitch-period estimates at time t_i along with back-pointers to
the estimates at time t_{i-1} which yielded the minimum transition penalties.


(5)  Choose  the  optimal  pitch-period  trajectory.  When  the  cumulative 
penalty  has  been  calculated  for  the  last  time  sample  to  be  considered  in  the 
phrase,  the  pitch-period  trajectory  with  the  lowest  cumulative  penalty  is 
chosen  as  the  optimal  trajectory. 


(6)  Quantize the optimal pitch-period trajectory. The resulting
optimal pitch-period trajectory is then quantized to one of sixteen levels at
each time point on the trajectory. The quantized pitch period trajectory is
then stored with the spectrum at that time point. The pitch period at each
time sample is treated as an auxiliary measure and is used for verification
exactly as the spectrum. The use of the pitch period in verification is
described in Section VI.C.2. The quantization levels follow in Table 7.


TABLE 7.  PITCH-PERIOD QUANTIZATION

Quantization Level      Pitch-Period Range (ms)

        0                        τ ≤ 6.0      (≥ 167 Hz)
        1                6.0   < τ ≤ 6.45
        2                6.45  < τ ≤ 6.9
        3                6.9   < τ ≤ 7.35
        4                7.35  < τ ≤ 7.8
        5                7.8   < τ ≤ 8.25
        6                8.25  < τ ≤ 8.7
        7                8.7   < τ ≤ 9.15
        8                9.15  < τ ≤ 9.6
        9                9.6   < τ ≤ 10.05
       10                10.05 < τ ≤ 10.5
       11                10.5  < τ ≤ 10.95
       12                10.95 < τ ≤ 11.4
       13                11.4  < τ ≤ 11.85
       14                11.85 < τ ≤ 12.3
       15                        τ > 12.3     (< 50 Hz)
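
Steps (3) through (6) amount to a dynamic-programming search followed by a table lookup, and can be sketched in Python as below. This is only an illustrative sketch under stated assumptions, not the implementation used in the study: the function and variable names are hypothetical, the penalty forms follow one reading of the transition-penalty table, the absolute value in the voiced-to-voiced penalty is an assumption, the track is assumed to start from an unvoiced state, and the scale of the Cepstral peak values is not given in the text.

```python
import bisect

BETA, K1, K2, T1 = 200.0, 0.08, 0.19, 0.0  # constants listed under step (3)
MAX_SAVED = 8                               # estimates kept per time step, step (4)
UNVOICED = None                             # stands in for the unvoiced estimate

# Upper edges (ms) of quantization levels 0..14 from Table 7; level 15 lies above 12.3 ms.
LEVEL_EDGES_MS = [6.0, 6.45, 6.9, 7.35, 7.8, 8.25, 8.7, 9.15,
                  9.6, 10.05, 10.5, 10.95, 11.4, 11.85, 12.3]


def transition_penalty(prev_tau, tau, peak):
    """Penalty for moving from prev_tau to tau (both in ms, or UNVOICED)."""
    if tau is UNVOICED:
        return K1                                  # any state -> unvoiced
    if prev_tau is UNVOICED:
        return 1.0 + BETA * T1 + K2                # unvoiced -> voiced
    # voiced -> voiced; abs() is an assumption so that any pitch jump is penalized
    return (1.0 + BETA * abs(tau - prev_tau) / tau) / peak


def track_pitch(frames):
    """frames[i] is a list of (tau_ms, cepstral_peak) candidates at time step i.

    Returns one pitch period (or UNVOICED) per time step along the trajectory
    with the lowest cumulative penalty.
    """
    # Each saved entry is (cumulative_penalty, tau, index_of_predecessor).
    saved = [[(0.0, UNVOICED, -1)]]                # assumed unvoiced start state
    for candidates in frames:
        previous = saved[-1]
        current = []
        for tau, peak in list(candidates) + [(UNVOICED, None)]:
            cost, back = min(
                (cum + transition_penalty(prev_tau, tau, peak), j)
                for j, (cum, prev_tau, _) in enumerate(previous)
            )
            current.append((cost, tau, back))
        current.sort(key=lambda entry: entry[0])
        saved.append(current[:MAX_SAVED])          # keep the eight cheapest
    # Follow the back-pointers from the cheapest final estimate.
    index, path = 0, []
    for step in range(len(saved) - 1, 0, -1):
        _, tau, back = saved[step][index]
        path.append(tau)
        index = back
    return path[::-1]


def quantize(tau_ms):
    """Map a voiced pitch period in ms to a Table 7 level (0..15)."""
    return bisect.bisect_left(LEVEL_EDGES_MS, tau_ms)
```

With hypothetical candidates at 8.0 ms and 16.0 ms in every frame, the search stays on the 8.0 ms track even in a frame where the 16.0 ms peak is the larger one, which is the kind of octave error a per-frame maximum-peak choice would make.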
SECTION VI
EXPERIMENT


A limited experiment was performed for purposes of comparing the
performance of the compensation techniques over the original and degraded
channels.


A.  Data  Set 

The  data  set  consisted  of  a subset  of  16  speakers  from  the  BISS  Phase  I 
data  set  collected  at  Texas  Instruments.  An  enrollment  session  and  four 
execution  sessions  were  selected  for  each  speaker.  The  enrollment  session 
consisted  of  20  phrases  and  each  execution  session  contained  four  phrases. 
The phrases were prompted from a randomly selected set of 32 phrases, each
containing four monosyllabic words. The words used in the BISS Phase I set are
given  in  Table  8 with  the  allowable  sequences  shown  by  the  arrows.  Each 
true  speaker  was  used  as  a casual  impostor  against  the  other  15  speakers. 

B.  Degraded  Channel  Condition 


The analog data set of 16 speakers was processed through the RADC
Digital Communications Experiment Facility (DICEF) to impose a 4A line
characteristic on the data to simulate a telephone line. The DICEF facility is a
digital filtering system that can introduce amplitude distortion and time
delay; the amplitude and time-delay characteristics are shown in Figures 6 and 7,
respectively. The sampling frequency used in this experiment was 9200 samples
per second. It was discovered after the data had been processed that it was
not low-pass filtered at the half-sample frequency to prevent aliasing. This
apparently caused some time registration problems, which will be discussed in
Section VII.A.

TABLE 8.  WORD SET FOR VERIFICATION UTTERANCE CONSTRUCTION

[Table body: only the entries "Cool" and "Strange -> Toads" survive.]
White noise was added to the data to simulate a 20 dB signal-to-noise
ratio. Less than one percent of the telephone lines have a signal-to-noise
ratio as low as 20 dB, according to the Bell Laboratory Survey.² This was
considered a good test for the noise resistance of the speaker verification
strategy. The data processed through the 4A channel with white noise added to
simulate a 20 dB signal-to-noise ratio will be referred to as the degraded
channel.

C.  Enrollment


The enrollment of the speakers means that an average set of reference
scanning patterns and recognition patterns is obtained for each of the 16
words. There are 20 phrases of enrollment data, selected so that each word
is repeated five times. Therefore, the average pattern for each word is
formed from five separate patterns.


1.  Scanning Patterns

The  scanning  patterns  are  defined  by  first  manually  choosing  two 
time-registration  points  for  each  word  in  the  phrase.  These  registration 
points  were  chosen  to  lie  just  before  the  vowel  and  just  after  the  vowel  in 
a region  where  the  spectrum  has  the  most  change.  When  the  two  registration 
points  have  been  located,  a scanning  pattern  is  defined  at  each  of  the  points, 
as described in Section II.B.1 and shown in Figure 1. Thus, each word has
two  scanning  patterns  associated  with  it. 

For  this  experiment,  the  registration  points  are  chosen  manually 
for  the  first  four  phrases  of  the  enrollment.  Preliminary  scanning  patterns 
are  then  defined  which  are  used  to  scan  the  remainder  of  the  enrollment 
phrases  to  mark  the  registration  points.  As  each  phrase  is  marked  in  order, 
a new  scanning  pattern  for  each  word  in  the  phrase  is  defined  and  averaged 
with  the  previous  patterns  for  that  word. 

2.  Pitch

Once  the  registration  points  have  been  marked,  the  pitch  period  is 
extracted.  Since  the  time  registration  for  the  words  is  chosen  around  the 
vowels,  most  of  the  region  between  the  registration  points  should  be  voiced, 
depending on the exact location of the registration points. The pitch period
was  then  extracted  only  between  the  two  registration  points  for  each  word. 


The  recognition  patterns  for  each  word  were  defined  by  choosing  six 
columns  of  the  spectrum,  plus  pitch  period,  interpolated  between  the  two 
marked  time-registration  points  for  that  word.  Again,  five  patterns  are 
summed  to  obtain  the  reference  recognition  patterns  for  each  word.  These 
recognition  patterns  will  be  used  to  compute  a spectral  and  a pitch-period 
error  for  verification  purposes. 


D.  Execution 

In the execution phase of the study, the input data is scanned by the
reference scanning patterns belonging to the words in the phrase under
consideration. As described earlier in Section II.B, an optimal decision
strategy is used to choose a best set of eight registration points for each
phrase. With the registration points thus defined, the pitch-period is
extracted for the points between the registration points of each word. The
pitch-period is then extended for five time samples before the first
registration point of the word and five samples beyond the last registration
point. This allows for some variance in the registration points when
using scanning patterns defined on various channel conditions. The input
data is then formatted into recognition patterns and compared with the
reference patterns to form a spectral and pitch-period error for each phrase.

E.  Design 

The  enrollment  for  the  experiment  was  performed  on  data  from  the  original 
channel.  The  execution  was  performed  for  data  from  both  the  original  and  the 
degraded  channels.  In  addition,  a test  was  run  with  registration  on  the 
degraded  channel  and  execution  on  the  original  and  the  degraded  channels  to 
determine  which  channel  condition  should  be  used  for  enrollment.  Several 
verification  errors  were  then  calculated  for  each  execution  phrase. 


These were:

(1)  Spectral error E_s

(2)  Pitch-period error E_p

(3)  Relative pitch-period error E_rp = E_p - Ē_p, where Ē_p is the mean
value of the pitch-period over the phrase.

In addition, two weighted sum errors were calculated:

(4)  E_s + W E_p

(5)  E_s + W E_rp, where W is a weighting constant.


The above error measures were used in several multiple phrase strategies.
The definition of the one, two, three, and four phrase strategies is given in
Table 9. In this table, E_1 represents the error measure (this could be
any one of the five measures given above; for example, the spectral error) for the
first phrase of the four phrases of each execution session; E_2 is the same
error measure for the second phrase of the execution session; E_3 is the error
measure for the third phrase; and E_4 is the error measure for the fourth phrase.


TABLE 9.  MULTIPLE PHRASE STRATEGIES

One phrase strategy:    Ê_1 = E_1;  Ê_2 = E_2;  Ê_3 = E_3;  Ê_4 = E_4

Two phrase strategy:    Ê_1 = (E_1 + E_2)/2;  Ê_2 = (E_2 + E_3)/2;
                        Ê_3 = (E_3 + E_4)/2;  Ê_4 = (E_4 + E_1)/2

Three phrase strategy:  Ê_1 = (E_1 + E_2 + E_3)/3;  Ê_2 = (E_2 + E_3 + E_4)/3;
                        Ê_3 = (E_3 + E_4 + E_1)/3;  Ê_4 = (E_4 + E_1 + E_2)/3

Four phrase strategy:   Ê_1 = Ê_2 = Ê_3 = Ê_4 = (E_1 + E_2 + E_3 + E_4)/4
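
The strategies of Table 9 are circular running averages over the four phrase errors of a session. A minimal Python sketch is given here for illustration; the function name and the example error values are hypothetical.

```python
def multiphrase_errors(errors, n_phrases):
    """Average each phrase error with the following n_phrases - 1 errors,
    wrapping around to the start of the session as in Table 9."""
    count = len(errors)
    if not 1 <= n_phrases <= count:
        raise ValueError("n_phrases must be between 1 and the number of phrases")
    return [
        sum(errors[(i + k) % count] for k in range(n_phrases)) / n_phrases
        for i in range(count)
    ]


# Hypothetical error measures for the four phrases of one execution session.
session = [2.0, 4.0, 6.0, 8.0]
two_phrase = multiphrase_errors(session, 2)    # [3.0, 5.0, 7.0, 5.0]
four_phrase = multiphrase_errors(session, 4)   # [5.0, 5.0, 5.0, 5.0]
```

Each session still yields four decision values under every strategy, so the number of trials is the same for the one, two, three, and four phrase results.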
SECTION VII
RESULTS


The results of the limited-data test were very encouraging. Using a
four phrase strategy with a weighted sum error of the spectral and relative
pitch-period errors, a one percent true speaker rejection rate and one
percent impostor acceptance rate were obtained. All of the true speaker data
could be time-registered; however, some of these were misregistrations.
The results will be discussed in more detail in the following sections.

A.  Time  Registration 

All of the true-speaker data could be time-registered. However,
it was determined that several of the phrases were registered at incorrect
locations. Out of 256 phrases of true-speaker data, four phrases were grossly
misregistered on at least one word and ten more phrases were judged to have
minor misregistrations. An example of a minor misregistration is shown in
Figure  8 for  the  word  "sing"  in  the  phrase  "huge  toads  sing  deep."  Good 
registration  points  for  the  word  "sing"  would  be  at  time  points  179  and  192 
which  were  obtained  when  the  enrollment  and  execution  were  both  on  the 
degraded  channel.  When  the  enrollment  was  on  the  original  channel  and  the 
execution  on  the  degraded  channel,  the  registration  points  were  at  172  and 
192.  This  was  the  fourth  execution  session  of  speaker  No.  3.  A possible 
explanation  for  the  misregistration  can  be  seen  by  looking  at  the  enrollment 
of  the  word  "sing"  for  speaker  No.  3 over  the  original  and  degraded  channels. 
Figure  9 shows  the  spectrum  for  the  word  "sing"  on  the  original  channel.  The 
first  registration  point  for  the  word  was  chosen  at  time  sample  168.  Notice 
the  lack  of  energy  just  before  this  registration  point.  Comparing  this  to 
the  registration  over  the  degraded  channel  shown  in  Figure  10,  it  is  apparent 
that  considerable  energy  is  present  in  front  of  the  registration  point  at 
175.  This  energy  has  been  added  by  the  4A  line  and  is  thought  to  be  due  to 
the  aliasing  that  is  present.  As  mentioned  previously,  the  DICEF  facility 
did  not  low-pass  filter  the  data  before  sampling,  which  caused  aliasing  to 
occur. It is not felt that the additional energy is due to the noise, since
that seems to occur more or less uniformly throughout the word. It was
thought at first that the aliasing would not cause a problem since the
verification is done on the vowel, which is relatively low in frequency. However, the
registration points are placed at rapidly changing points of the spectrum
and, in this case and several others, the registration is between either a
fricative or stop consonant and the vowel. The high frequency content of
the fricative or stop causes the problem.

Figure 8.  Time Registration of the Word "Sing" on a Degraded Channel
(panels: Enroll - Original, Execute - Degraded; Enroll - Degraded, Execute - Degraded)

Figure 9.  Enrollment Time Registration of "Sing" on Original Channel

Figure 10.  Enrollment Time Registration of "Sing" on Degraded Channel

Also, in analyzing the results of the registration, it became apparent
that the pitch-period could be used as a method of eliminating gross
misregistrations. To see this, consider that the registration points are chosen about
the vowels, which are voiced. These registration points are chosen just before
and just after the vowel, so the pitch track may not be stable at the beginning
and the end; however, most of the interval between the registration points
should be voiced. It has been found that if little voicing is present in the
interval, registration points are badly placed and the pitch-period error
is usually very large. Therefore, by requiring that voicing occur over some
percentage of the interval, such as 60%, it can be assumed that the
registration is probably correct. The voicing decision would be very easy to
calculate at the time the pitch is being extracted.
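
The voicing requirement described above reduces to a simple ratio test on the extracted pitch track. The following Python sketch illustrates the idea; the function name and the use of None to mark unvoiced time samples are assumptions, and the 60% figure is the example value from the text rather than a tuned threshold.

```python
def registration_is_plausible(pitch_track, min_voiced_fraction=0.6):
    """Return True when enough of the interval between the two registration
    points of a word is voiced. pitch_track holds one pitch period per time
    sample, with None marking an unvoiced sample."""
    if not pitch_track:
        return False                      # nothing between the registration points
    voiced = sum(1 for tau in pitch_track if tau is not None)
    return voiced / len(pitch_track) >= min_voiced_fraction


# Hypothetical 10-sample interval: six voiced samples, four unvoiced.
word_track = [None, None, 8.0, 8.1, 8.1, 8.2, 8.2, 8.3, None, None]
accepted = registration_is_plausible(word_track)   # True: 6 of 10 samples voiced
```

A word failing this test would be treated as misregistered before any spectral or pitch-period error is computed for it.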

B.  Verification 

The  first  result  of  interest  is  the  determination  of  the  channel  condition 
to  use  for  enrollment.  Table  10  gives  the  results  for  the  spectral  error  and 
the  pitch-period  error  of  enrolling  on  the  original  channel  and  executing  on 
both  the  original  and  degraded  channels.  Likewise,  the  results  of  enrolling 
on  the  degraded  channel  with  execution  on  both  the  original  and  degraded 
channels  are  given.  The  results  are  given  in  the  form  of  the  equal  error 
rate,  which  is  the  rate  at  which  the  true  speaker  rejection  and  impostor 
acceptance  are  equal.  As  can  be  seen  from  the  table,  if  execution  is  always 
to  be  over  the  same  line  as  the  enrollment,  then  that  line  should  be  used  for 
enrollment.  However,  if  any  cross-channel  conditions  are  to  be  encountered 
(i.e.,  enrollment  on  one  channel  condition  and  execution  over  a different 
channel),  the  results  indicate  the  best  overall  performance  is  obtained  by 
enrolling  on  the  original  channel. 
Hence, the enrollments for the remainder of the experiment were performed
using the original channel conditions. The results for the remainder of the
experiment also include the multiple phrase strategies discussed in Section
VI.E.
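
As an illustration of how an equal error rate can be read off two sets of error scores, the Python sketch below sweeps a decision threshold over the observed errors and reports the point where true speaker rejection and impostor acceptance are closest. The function name, the acceptance rule (a phrase is accepted when its error does not exceed the threshold), and the averaging of the two rates when they do not coincide exactly are assumptions for illustration, not the procedure used in the study.

```python
def equal_error_rate(true_errors, impostor_errors):
    """Return (rate, threshold) at the threshold where the true speaker
    rejection rate and the impostor acceptance rate are closest."""
    best = None
    for threshold in sorted(set(true_errors) | set(impostor_errors)):
        # True speakers are rejected when their error exceeds the threshold.
        rejection = sum(e > threshold for e in true_errors) / len(true_errors)
        # Impostors are accepted when their error does not exceed it.
        acceptance = sum(e <= threshold for e in impostor_errors) / len(impostor_errors)
        gap = abs(rejection - acceptance)
        if best is None or gap < best[0]:
            best = (gap, (rejection + acceptance) / 2.0, threshold)
    return best[1], best[2]


# Hypothetical error scores for four true-speaker and four impostor phrases.
rate, threshold = equal_error_rate([1.0, 2.0, 3.0, 6.0], [4.0, 5.0, 7.0, 8.0])
```

For these hypothetical scores the two rates meet at a threshold of 4.0, where one of four true-speaker phrases is rejected and one of four impostor phrases is accepted.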


TABLE 10.  EQUAL ERROR RATES FOR ENROLLMENT ON
ORIGINAL AND DEGRADED CHANNELS

Attribute                     Enroll      Execute     Equal Error Rate (%)

Spectral Error E_spectral     Original    Original          5.0
                              Original    Degraded          6.4
                              Degraded    Original          6.8
                              Degraded    Degraded          6.1

Pitch-Period Error E_pitch    Original    Original         25.5
                              Original    Degraded         26.2
                              Degraded    Original         26.7
                              Degraded    Degraded         25.2

The equal error rate results for the spectral error are shown in Table 11.
The 14-channel results given at the top of the table are taken from the Speaker
Verification II Report.³ The 10-channel results were obtained in the current
experiment. The differences in the results are not entirely due to the number
of channels of spectral data used. The 14-channel results were obtained from
an experiment using the entire BISS Phase I data set and processing developed
for the BISS system. These are described in the Speaker Verification II
Report.³ The 10-channel results were obtained using a subset of the BISS
Phase I data set and processing described in Sections VI.A and V, respectively,
of this report.

The decision function distribution using the spectral error in a one-
phrase strategy is given in Figure 11. The figure compares the distributions
obtained from executing over the original channel and the degraded channel.
Figure 12 shows the decision function distributions of the spectral error
for a four-phrase strategy. Again, the comparison is of the distributions
obtained from executing on the original channel and on the degraded channel.
Note that the BISS specifications of one percent true speaker rejection and
two percent impostor acceptance are not met with the spectral error when
executing over a degraded channel.


TABLE 11.  EQUAL ERROR RATE FOR MULTIPHRASE STRATEGIES
USING THE SPECTRAL ERROR

Attribute          Enrollment    Execution    # Phrases    Equal Error Rate (%)

Spectral Error     Original      Original         1               4.2
(14-channel)                                      2               1.6
                                                  3               0.9
                                                  4               0.5

Spectral Error     Original      Original         1               5.0
(10-channel)                                      2               1.5
                                                  3               1.0
                                                  4               0.3

Spectral Error     Original      Degraded         1               6.4
(10-channel)                                      2               2.7
                                                  3               1.6
                                                  4               1.6

The equal error rates for the pitch-period error and the relative
pitch-period error are given in Table 12. There are several items of
particular interest in this table. First, the performance for the pitch-period error
on the original and degraded channels is approximately equivalent, as was
expected. However, this does not seem to hold true for the relative pitch-
period error. This was not due to any failing of the relative pitch-period
error, but because three of the sequences used were grossly misregistered.
This caused a very large error to occur in the relative pitch-period, but
not as severe an error in the pitch-period. The problems of misregistration
were covered in the previous section.
TABLE 12.  EQUAL ERROR RATE FOR MULTIPHRASE STRATEGIES USING
THE PITCH-PERIOD AND RELATIVE PITCH-PERIOD ERRORS

Attribute          Enrollment    Execution    # Phrases    Equal Error Rate (%)

Pitch-Period       Original      Original         1              25.5
Error                                             2              20.8
                                                  3              19.2
                                                  4              17.2

Pitch-Period       Original      Degraded         1              26.2
Error                                             2              22.0
                                                  3              19.1
                                                  4              18.4

Relative Pitch-    Original      Original         1              18.9
Period Error                                      2              13.4
                                                  3               9.2
                                                  4               6.1

Relative Pitch-    Original      Degraded         1              19.3
Period Error                                      2              14.9
                                                  3              15.5
                                                  4              13.8


A second item of interest in the table is that the relative pitch-period
error gave better overall performance than did the pitch-period error. This
is because the relative pitch-period error approximates the performance of
the pitch-period slope. Doddington⁴ found that better performance is obtained
with the pitch-period slope than with the pitch-period.

A third item of interest also relates to the results of Doddington.⁴
In the experiments using pitch-period slope, he achieved an equal error rate
of six percent on a one-phrase strategy as opposed to the 19% obtained here.
The various reasons for this are discussed below.

(a)  In his experiments, Doddington adapted the speaker's pitch-period
slope over as many as ten sessions with two-day lapses between
sessions. In the experiment conducted here, the enrollment on the
pitch-period was accomplished in one session. The pitch-period can vary from
enrollment to execution as shown in Figure 13, where the average
pitch-period for several words is plotted versus the occurrence of the word.
The first five occurrences were during enrollment and the pitch-periods are
relatively stable for the two speakers. However, in the next four occurrences
of the word, which were during the execution sessions, a considerable
variation in the pitch period is observed. This variation was probably due to
"mike fright" and would dissipate as the speaker became familiar with the
system. This variation problem can be overcome with an adaptation of the
pitch-period over several sessions.

(b)  The phrase used in Doddington's experiment was a single phrase,
"We were away a year ago"; whereas, the phrases used here are a random
assortment of four monosyllabic words. The pitch-period obtained for any one word
in the phrase might change depending on the two words immediately preceding
and following. The enrollment sessions did not allow all of the transitions
of the words to occur. This problem can also be overcome with adaptation
of the speaker's pitch period.

(c)  The phrases used in the current experiment were prompted. This
could have an adverse effect on performance in that some of the speakers
might change their manner of speaking to accommodate the prompting.
Adapting the speaker's pitch-period would also tend to alleviate this problem.

The  decision  function  distributions  using  the  pitch-period  error  in  a 
four-phrase  strategy  are  plotted  in  Figure  14,  and  the  decision  function 
distributions  using  the  relative  pitch-period  error  in  one  and  four-phrase 
strategies are given in Figures 15 and 16, respectively. In all three figures,
the  distributions  are  compared  for  execution  over  the  original  and  degraded 
channel  conditions. 

Two weighted sum errors were then calculated for various values of a
weighting constant W. These weighted sum errors were the spectral error
plus the pitch-period error

        E_sp = E_s + W E_p

and the spectral error plus the relative pitch-period error

        E_srp = E_s + W E_rp

A typical curve of the equal error rate, EER, versus the weighting
constant W is shown in Figure 17 for the particular case of the spectral
error plus the relative pitch-period error E_srp with the enrollment on the
original channel, execution on a degraded channel, and various multiphrase
strategies. Notice that the optimum weighting W_opt, which is the minimum of
the curves, varies considerably from one strategy to another. This randomness
of the optimum weighting constant for the various strategies is probably
due to the limited data set used in the experiment.
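
The search for an optimum weighting can be pictured as a sweep over candidate values of W, with the equal error rate recomputed for each weighted sum. The Python sketch below is a hedged illustration only: the function names, the grid of candidate weights, the acceptance rule (error not exceeding the threshold), and the example error pairs are all assumptions rather than details taken from the study.

```python
def equal_error_rate(true_errors, impostor_errors):
    """Rate at the threshold where rejection and acceptance are closest."""
    best = None
    for threshold in sorted(set(true_errors) | set(impostor_errors)):
        rejection = sum(e > threshold for e in true_errors) / len(true_errors)
        acceptance = sum(e <= threshold for e in impostor_errors) / len(impostor_errors)
        gap = abs(rejection - acceptance)
        if best is None or gap < best[0]:
            best = (gap, (rejection + acceptance) / 2.0)
    return best[1]


def best_weight(true_pairs, impostor_pairs, weights):
    """Each pair is (E_s, E_rp). Returns (lowest EER, weight giving it)."""
    results = []
    for w in weights:
        true_sum = [e_s + w * e_rp for e_s, e_rp in true_pairs]
        impostor_sum = [e_s + w * e_rp for e_s, e_rp in impostor_pairs]
        results.append((equal_error_rate(true_sum, impostor_sum), w))
    return min(results)   # ties are resolved in favor of the smaller weight


# Hypothetical (spectral, relative pitch-period) error pairs.
true_pairs = [(1.0, 1.0), (2.0, 0.0), (4.0, 0.0)]
impostor_pairs = [(3.0, 4.0), (3.5, 5.0), (5.0, 6.0)]
eer, w_opt = best_weight(true_pairs, impostor_pairs, [0.0, 1.0])
```

With so few scores the equal error rate changes in large steps as W varies, which is consistent with the remark above that the optimum weighting is unstable on a limited data set.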

Table 13 shows the equal error rate for the two weighted sum errors
E_sp and E_srp. The equal error rates shown were obtained by using the
optimal weighting constant W_opt for each of the multiphrase strategies. As
can be seen, the combination of the spectral error with the relative pitch-
period error gave better results than did the spectral error with the pitch-
period error. Also, the performance of the weighted sum errors was worse on
the degraded channel than on the original channel. Part of this degradation
in performance was due to the misregistration of the phrases as mentioned
earlier.


The decision function distributions using the spectral error plus the
relative pitch-period error with a weighting of W = 0.95 in a four-phrase
strategy are shown in Figure 18. A comparison is shown of the distributions
for execution with the degraded and the original channel conditions. By
choosing a decision threshold D_T such that 215 < D_T < 220, it can
be seen that a performance of better than a one percent true speaker
rejection and a one percent impostor acceptance is obtained for execution on both
the original and degraded channels. Thus it has been shown that for the
limited experiment conducted here, the BISS specifications of one percent
true speaker rejection and two percent impostor acceptance were met.


TABLE 13.  EQUAL ERROR RATE FOR MULTIPHRASE STRATEGIES
USING WEIGHTED SUM ERRORS

                                                            Equal Error
Attribute             Enrollment   Execution   # Phrases     Rate (%)     Weighting

Spectral + W          Original     Original        1            4.7          0.35
(Pitch-Period                                      2            1.3          0.55
Error): E_sp                                       3            1.0          0.0
                                                   4            0.2          0.05

Spectral + W          Original     Degraded        1            6.0          0.65
(Pitch-Period                                      2            2.4          0.4
Error): E_sp                                       3            1.5          0.1
                                                   4            1.6          0.65

Spectral + W          Original     Original        1            3.9          0.5
(Relative                                          2            1.0          0.8
Pitch-Period                                       3            0.5          1.8
Error): E_srp                                      4            0.0          0.25

Spectral + W          Original     Degraded        1            5.8          0.95
(Relative                                          2            2.4          0.2
Pitch-Period                                       3            1.4          0.1
Error): E_srp                                      4            0.6          0.95

Figure 18.  Decision Function Distributions Using Spectral Plus Relative Pitch-Period
Error with W = 0.95 in Four Phrase Strategy
(axis label: Threshold; curve label: True Speakers; enrollment on the original
channel, execution on the original and degraded channels)

SECTION VIII
SUMMARY


A study was conducted to develop a speaker verification system for use
over degraded communication channels. A test of the current speaker
verification technology was performed on a degraded channel and compensation
techniques were then developed, i.e., a band-limited spectrum for time
registration and the addition of the pitch-period error to the spectral error for
speaker discrimination. This configuration was tested with a limited data
set, and the BISS specifications of one percent true speaker rejection and
two percent impostor acceptance were met with a four-phrase strategy
employing the spectral error plus the relative pitch-period error.

A.  Conclusions 

It appears that an operational remote terminal speaker verification
system utilizing a telephone line for a communication channel will meet the BISS
specifications of one percent true speaker rejection and two percent impostor
acceptance. This system would employ the channel compensation techniques
developed in this study along with the following modifications.

(1)  Implementation of a voicing requirement such that some percentage
of the interval between the time registration points of each word must be
voiced, e.g., 60% of the interval must be voiced or it is assumed that the
word is misregistered. Had this voicing requirement been used in the present
limited experiment, several misregistered phrases would have been rejected
and the four-phrase strategy using the spectral error plus the relative
pitch-period error would have resulted in the complete separation of the
true speaker and impostor errors.

(2)  Adaptation of the relative pitch-period over several sessions
beyond the enrollment before it is weighted very heavily with the spectral
error. The need for this adaptation is seen in Figure 13 where the average
pitch-period for two words from each of two speakers is plotted against the
occurrences of the words. The first five occurrences of each word are during
the enrollment session and a consistent pitch-period is observed for each
word. However, during the next four occurrences, which are the execution
sessions, a wide variance of pitch periods is observed. Averaging the
pitch-period over the execution sessions, as well as over the enrollment, would
result in a much better reference pitch-period for each word. This would
then lead to much improved performance of the weighted sum error for speaker
discrimination.

(3)  Normalization  of  the  spectral  error  by  the  noise  energy  from 
regions  of  the  spectrum  in  which  the  speaker  is  silent.  This  would  tend  to 
stabilize  the  spectral  error  from  the  original  to  the  degraded  channel  and 
would  make  it  easier  to  pick  a decision  threshold. 

It can be seen from Figure 18 that a decision threshold D_T such that
215 < D_T < 220 results in one percent true speaker rejection and
one percent impostor acceptance for execution over both original and degraded
channel conditions. This performance was achieved using a four-phrase
strategy with the spectral error plus the relative pitch-period error. The
problem of choosing a decision threshold was not addressed to any great
extent, however, because of the limited data set used in the experiment.
Setting the decision thresholds for an operational system requires a larger
experiment.

B.  Telephone  Experiment 

To  obtain  a preliminary  evaluation  over  an  actual  telephone,  Dan  Daniel 
of  Texas  Instruments  (a  participant  in  the  original  BISS  Phase  I experiment) 
was  asked  to  enroll  over  a telephone  in  the  laboratory  and  to  use  four 
different  office  telephones  in  the  laboratory  for  execution  sessions.  Eight 
phrases  of  execution  were  performed  in  each  of  the  four  offices  by  dialing  an 
inside  line  (Centrex)  and  also  by  dialing  an  outside  line  (dial  "9").  In 
one  of  the  offices,  he  was  also  asked  to  call  the  Texas  Instruments  plant 
at  Austin,  Texas,  and  have  them  patch  him  back  to  the  data  phone  in  the  lab. 
He also performed an execution session in the sound booth as a reference,
since his original data was collected in the sound booth. The results of the
eight phrases of execution were averaged, and Table 14 gives the resulting
average spectral and pitch-period errors. The variations in performance
between the inside lines and the outside lines are due to the different
switching equipment used for the two kinds of line. The variations observed
from office to office using only the inside line or only the outside line,
however, were due to variations in the telephone headset microphones. This is
true because all of the offices are connected to the same telephone line
through a "rotary" system. Thus, if an inside line is called from offices 1
and 2, the same line and switching network are being used, but the headset is
different.

A very  encouraging  aspect  of  this  limited  experiment  was  that  all  of  the 
execution  phrases  were  correctly  time-registered. 

C.  Recommendations  for  Future  Work 

To  verify  the  performance  of  the  remote  terminal  speaker  verification 
system  developed  in  this  study,  a larger  simulated  telephone  experiment  should 
be conducted using the 4a line with a 20 dB signal-to-noise ratio. The
system to be tested would include the modifications discussed in Section
VIII.A, along with the channel compensation techniques developed in this
study.  Before  this  experiment  can  be  conducted,  however,  the  following  two 
preliminary  tasks  must  be  completed. 

(1)  A method  of  automatically  enrolling  the  speakers  on  the  system 
must  be  developed. 

(2) A real-time pitch tracker must be implemented. An array processor
will be installed on the laboratory system in mid-1977, which will allow
this task to be accomplished.

In addition to an evaluation of the performance of the speaker verification
system, the simulated telephone experiment should provide a set of
decision thresholds and a verification strategy.

Upon the completion and evaluation of the simulated telephone experiment,
another experiment should be conducted over actual telephone lines.

This  would  be  a two-fold  experiment.  The  first  phase  would  consist  of  a 
limited  scale  experiment  to  study  the  headset  microphone  problem.  This 
experiment  would  be  conducted  using  various  telephones  with  both  the  existing 
microphones  and  with  these  microphones  replaced  by  high  quality  microphones. 
Depending  on  the  outcome  of  this  limited  experiment,  a large  scale  experiment 
would  be  conducted  over  the  telephone  channels  with  either  the  existing  or 
high  quality  microphones. 


TABLE 14.  DAN DANIEL TELEPHONE EXPERIMENT

Condition             Spectral Error   Pitch-Period Error   Total
Dial 9 - Office 4          132.4              2.5           134.9
Austin - Office 4          133.5              4.5           138.0

APPENDIX

COMPARISON  OF  BISS  AND  REMOTE  TERMINAL  SPEAKER  VERIFICATION  SYSTEM  PROCESSING 


The differences between the BISS automatic speaker verification system
and the remote terminal speaker verification system are presented in this
appendix.


A.  Preprocessing

1.  Filter Bank Definition

          BISS (Digital)                      Remote Terminal (Analog)
Channel   Center Freq.   Bandwidth      Channel   Center Freq.   Bandwidth
  No.        (Hz)          (Hz)           No.        (Hz)          (Hz)

   1          355           220             1          350       [illegible]
   2          530           220             2          450       [illegible]
   3      [illegible]       220             3      [illegible]   [illegible]
   4          880       [illegible]         4          670           340
   5         1055           220             5          790       [illegible]
   6         1230           220             6          940           400
   7         1405           220             7         1120           400
   8         1580           220             8         1320           400
   9         1755           220             9         1550           440
  10         1930           220            10         1810           400
  11         2105           220            11         2100           400
  12         2280           220            12         2420       [illegible]
  13         2455           220            13         2800           400
  14         2630       [illegible]        14         3200           400
  15         2805       [illegible]        15         3800       [illegible]
BISS

where {p_k} are discretized Legendre polynomials.

Remote Terminal

where k = 0, 1, 2; the zeroth-order term equals 1; the higher-order terms
are cosines; and i = 1, ..., 14.


3.  Normalization

BISS

    (A_j)_N = (A_j)_R / \mu_j

where \mu_j is the average value.

Remote Terminal

The regressed vector is normalized by \sigma_{post,j}, computed over the
frames n = j-8, ..., j+8, where \sigma_{post,j} is the standard deviation.

By normalizing by the standard deviation rather than the mean, the resistance
to noise has been increased by approximately 10 dB.
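
The difference between the two normalizations can be illustrated with a short sketch. The exact formulas are not legible in this copy, so the code assumes the simplest reading: the BISS form divides a vector by its own mean, and the remote terminal form divides frame j by the standard deviation of all values in frames j-8 through j+8. Function names and test data are hypothetical.

```python
import math


def normalize_by_mean(vector):
    """Divide a spectral vector by its average value (assumed BISS form)."""
    mean = sum(vector) / len(vector)
    return [v / mean for v in vector]


def normalize_by_std(frames, j, half_window=8):
    """Divide frame j by the standard deviation of the surrounding frames.

    The window covers frames j-half_window .. j+half_window, clipped to
    the available data. Assumes the windowed values are not all equal.
    """
    lo = max(0, j - half_window)
    hi = min(len(frames), j + half_window + 1)
    values = [v for frame in frames[lo:hi] for v in frame]
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / len(values)
    std = math.sqrt(variance)
    return [v / std for v in frames[j]]


# Hypothetical two-channel spectra for three frames.
frames = [[1.0, 3.0], [2.0, 4.0], [3.0, 5.0]]
shifted = [[v + 10.0 for v in frame] for frame in frames]

plain = normalize_by_std(frames, 1, half_window=1)
offset = normalize_by_std(shifted, 1, half_window=1)
```

In this toy case a constant offset added to every value leaves the spacing between the normalized channels unchanged, because the standard deviation ignores the offset while the mean does not.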

4.  Quantization

The method of quantization for the BISS and remote terminal systems
is the same. The regressed and normalized vectors are quantized to one of
eight levels, with the quantization thresholds computed to provide a
uniform first-order distribution, i.e.:

    \Pr[(a_{ij})_N \in (d_{i,q}, d_{i,q+1})] = \frac{1}{8}  for all q.
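
One way to obtain thresholds with this equal-probability property is to take them from the order statistics of a set of training values. The following is a minimal sketch under that assumption; the report does not say how the thresholds were actually computed, and the names and sample values are hypothetical.

```python
from bisect import bisect_right


def equal_probability_thresholds(samples, levels=8):
    """Pick levels-1 thresholds so each level holds about the same
    share of `samples` (empirical quantiles)."""
    ordered = sorted(samples)
    n = len(ordered)
    return [ordered[(q * n) // levels] for q in range(1, levels)]


def quantize(value, thresholds):
    """Return the level index, 0 .. len(thresholds), containing `value`."""
    return bisect_right(thresholds, value)


# Hypothetical training values for one vector element.
samples = [float(v) for v in range(16)]
thresholds = equal_probability_thresholds(samples)

counts = [0] * 8
for v in samples:
    counts[quantize(v, thresholds)] += 1
```

Each of the eight levels ends up with the same number of training values, which is the uniform first-order distribution described above.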


B.  Processing 

1.  Scanning Pattern Definition

BISS

The scanning patterns are formed from the spectrum; the scanning pattern
at time t_j consists of six vectors of spectral data:

    \{A_{j-5}, A_{j-3}, A_{j-1}, A_{j+1}, A_{j+3}, A_{j+5}\}

Remote Terminal

The scanning patterns are formed from the scanning spectrum; the scanning
pattern formed at time t_j consists of (a) the average of the spectral data
at times t_{j-1} and t_{j-2}; (b) the average of the spectral data at times
t_{j+1} and t_{j+2}; and (c) the difference (b) - (a).
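
The two scanning-pattern definitions can be written out as a short sketch. It assumes the spectrum is stored as a list of frames, each a list of channel values, and that j is far enough from both ends for every offset to exist; the function names are hypothetical.

```python
def biss_scanning_pattern(spectrum, j):
    """Six spectral vectors at offsets -5, -3, -1, +1, +3, +5 around frame j."""
    return [spectrum[j + d] for d in (-5, -3, -1, 1, 3, 5)]


def remote_scanning_pattern(spectrum, j):
    """Return (a, b, b - a) for frame j.

    a: average of the spectral data at frames j-2 and j-1
    b: average of the spectral data at frames j+1 and j+2
    """
    def average(u, v):
        return [(x + y) / 2.0 for x, y in zip(u, v)]

    a = average(spectrum[j - 2], spectrum[j - 1])
    b = average(spectrum[j + 1], spectrum[j + 2])
    difference = [y - x for x, y in zip(a, b)]
    return a, b, difference


# Hypothetical two-channel spectrum with 11 frames.
spectrum = [[float(t), 2.0 * t] for t in range(11)]

six_vectors = biss_scanning_pattern(spectrum, 5)
a, b, difference = remote_scanning_pattern(spectrum, 5)
```

The remote terminal pattern carries both smoothed spectral levels and their change across frame j, whereas the BISS pattern samples the raw spectrum at six spaced frames.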


2.  Verification Pattern Definition

BISS

The verification pattern is just the scanning pattern. The scanning error
is used as the verification error. This can be thought of as a zeroth-order
time warping to account for time alignment.

Remote Terminal

The verification pattern is six columns of spectral data which are
interpolated between two time registration points. This is equivalent to a
first-order time warping in that it helps account for changes in the length
of words in addition to time alignment.
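
A minimal sketch of such an interpolation follows. It assumes the six columns are spaced evenly from the first registration point to the second, inclusive, and that values between frames are obtained by linear interpolation; neither detail is stated in the report, and the names are hypothetical.

```python
def interpolate_columns(spectrum, start, end, columns=6):
    """Sample `columns` spectral vectors evenly between frames `start`
    and `end`, interpolating linearly between neighbouring frames."""
    pattern = []
    for c in range(columns):
        position = start + (end - start) * c / (columns - 1)
        lo = int(position)
        hi = min(lo + 1, len(spectrum) - 1)
        frac = position - lo
        column = [(1.0 - frac) * x + frac * y
                  for x, y in zip(spectrum[lo], spectrum[hi])]
        pattern.append(column)
    return pattern


# Hypothetical one-channel spectrum whose value equals the frame index,
# so each interpolated value should equal its sampling position.
spectrum = [[float(t)] for t in range(11)]

long_word = interpolate_columns(spectrum, 0, 10)
short_word = interpolate_columns(spectrum, 0, 4)
```

Both a long and a short utterance are reduced to the same six columns, which is how a linear stretch of the time axis can absorb changes in word length.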


3.  Decision Function

BISS

    d_N = \frac{\sum_{k=1}^{N} \sum_{i=1}^{4} E_{ik}}
               {\min\left[\max\left(\sum_{k=1}^{N} \sum_{i=1}^{4} \bar{E}_{ik},\ 4N E_{\min}\right),\ 4N E_{\max}\right]}

where E_{ik} is the scanning error, \bar{E}_{ik} is the expected scanning
error, E_{\min} = [illegible]00, and E_{\max} = [illegible]0.

Remote Terminal

    d_N = E

where E is the squared error between the reference and input
verification patterns.


A decision  strategy  was  not  developed  for  the  remote  terminal  study  due  to  the 
limited  experiment  which  was  conducted. 
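
The two decision functions can be sketched as follows. The BISS form divides the summed scanning error by the summed expected scanning error, with that divisor clamped between 4N E_min and 4N E_max. The bound values are not legible in this copy, so the ones used below are placeholders, and the function names are hypothetical.

```python
def biss_decision(errors, expected_errors, e_min, e_max):
    """Normalized BISS decision value.

    errors, expected_errors: N rows of four scanning errors each.
    e_min, e_max: bounds on the per-term expected error (placeholders
    here; the report's values are illegible).
    """
    n = len(errors)
    total = sum(sum(row) for row in errors)
    expected = sum(sum(row) for row in expected_errors)
    divisor = min(max(expected, 4 * n * e_min), 4 * n * e_max)
    return total / divisor


def remote_decision(reference, observed):
    """Squared error between reference and input verification patterns,
    each given as a list of columns of spectral values."""
    return sum((r - o) ** 2
               for ref_col, obs_col in zip(reference, observed)
               for r, o in zip(ref_col, obs_col))


# Hypothetical values for a single row (N = 1).
d_biss = biss_decision([[100.0, 120.0, 80.0, 100.0]],
                       [[50.0, 50.0, 50.0, 50.0]],
                       e_min=100.0, e_max=400.0)
d_remote = remote_decision([[1.0, 2.0]], [[2.0, 4.0]])
```

In the example the expected error sum (200) falls below the lower bound (400), so the clamp prevents an unusually small expected error from inflating the decision value.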

